
Supervising the Transfer of Reasoning Patterns in VQA

Neural Information Processing Systems

We proceed along the lines of the proof of Theorem 5.1 in [11]. Given a set of i.i.d. data samples, we first introduce some notation. From the proof of Theorem 5.1 in [11], we also obtain the required property of H, which finishes the proof for the case p = 1. We then consider the case p = 2l + 1 and complete the proof for that case as well. Finally, we provide more details on the program decoder architecture.


Slim Scheduler: A Runtime-Aware RL and Scheduler System for Efficient CNN Inference

Harshbarger, Ian, Chidambaram, Calvin

arXiv.org Artificial Intelligence

Most neural network scheduling research focuses on optimizing static, end-to-end models of fixed width, overlooking dynamic approaches that adapt to heterogeneous hardware and fluctuating runtime conditions. We present Slim Scheduler, a hybrid scheduling framework that integrates a Proximal Policy Optimization (PPO) reinforcement learning policy with algorithmic, greedy schedulers to coordinate distributed inference for slimmable models. Each server runs a local greedy scheduler that batches compatible requests and manages instance scaling based on VRAM and utilization constraints, while the PPO router learns global routing policies for device selection, width ratio, and batch configuration. This hierarchical design reduces search space complexity, mitigates overfitting to specific hardware, and balances efficiency and throughput. Compared to a purely randomized task-distribution baseline, Slim Scheduler achieves a range of accuracy-latency trade-offs: for example, a 96.45% reduction in mean latency and a 97.31% reduction in energy usage when accuracy is dropped to that of the slimmest available model (70.3%). Alternatively, it can reduce average latency and energy consumption while increasing accuracy, at the cost of higher standard deviations in latency and energy, which affects overall task throughput.
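The local greedy step described in the abstract is easy to sketch. Below is a minimal, illustrative Python version of batching compatible requests under a VRAM budget; the `Request` fields, VRAM estimates, and batch limit are assumptions for the sketch, not the paper's actual interfaces, and the PPO router would sit above this layer, deciding which server's queue each request enters.

```python
# Hypothetical sketch of a local greedy batching scheduler for slimmable
# models: group requests that share a width ratio until the VRAM budget
# or batch-size limit is hit.
from dataclasses import dataclass

@dataclass
class Request:
    width_ratio: float   # slimmable width requested (e.g., 0.25, 0.5, 1.0)
    vram_mb: int         # estimated VRAM footprint of this request

def greedy_batch(queue, vram_budget_mb, max_batch=8):
    """Return (batch, remaining): one batch of compatible requests plus
    everything that didn't fit."""
    if not queue:
        return [], queue
    width = queue[0].width_ratio          # batch around the oldest request
    batch, rest, used = [], [], 0
    for r in queue:
        compatible = r.width_ratio == width
        fits = used + r.vram_mb <= vram_budget_mb and len(batch) < max_batch
        if compatible and fits:
            batch.append(r)
            used += r.vram_mb
        else:
            rest.append(r)
    return batch, rest

queue = [Request(0.5, 300), Request(0.5, 300), Request(1.0, 900)]
batch, remaining = greedy_batch(queue, vram_budget_mb=1024)
print(len(batch), len(remaining))  # -> 2 1
```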



Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers

Sampath, Aneesha, Tavernor, James, Provost, Emily Mower

arXiv.org Artificial Intelligence

Accurate speech emotion recognition is essential for developing human-facing systems. Recent advancements have included finetuning large, pretrained transformer models like Wav2Vec 2.0. However, the finetuning process requires substantial computational resources, including high-memory GPUs and significant processing time. As the demand for accurate emotion recognition continues to grow, efficient finetuning approaches are needed to reduce the computational burden. Our study focuses on dimensional emotion recognition, predicting attributes such as activation (calm to excited) and valence (negative to positive). We present various finetuning techniques, including full finetuning, partial finetuning of transformer layers, finetuning with mixed precision, partial finetuning with caching, and low-rank adaptation (LoRA) on the Wav2Vec 2.0 base model. We find that partial finetuning with mixed precision achieves performance comparable to full finetuning while increasing training speed by 67%. Caching intermediate representations further boosts efficiency, yielding an 88% speedup and a 71% reduction in learnable parameters. We recommend finetuning the final three transformer layers in mixed precision to balance performance and training efficiency, and adding intermediate representation caching for optimal speed with minimal performance trade-offs. These findings lower the barriers to finetuning speech emotion recognition systems, making accurate emotion recognition more accessible to a broader range of researchers and practitioners.
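The recommended recipe, finetuning only the final three transformer layers in mixed precision, is straightforward to express in PyTorch. A minimal sketch, assuming a CUDA device and the Hugging Face `Wav2Vec2Model`; the regression head, mean pooling, learning rate, and dummy batch are illustrative assumptions, not the paper's exact setup:

```python
# Partial finetuning of Wav2Vec 2.0 in mixed precision: freeze everything
# except the last three transformer layers, then train under autocast.
import torch
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").cuda()
for p in model.parameters():
    p.requires_grad = False
for layer in model.encoder.layers[-3:]:      # unfreeze final 3 layers
    for p in layer.parameters():
        p.requires_grad = True

# Illustrative head predicting two dimensions (activation, valence).
head = torch.nn.Linear(model.config.hidden_size, 2).cuda()
params = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(params + list(head.parameters()), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

wav = torch.randn(4, 16000, device="cuda")   # dummy 1-second batch @ 16 kHz
with torch.cuda.amp.autocast():              # mixed-precision forward pass
    hidden = model(wav).last_hidden_state    # (B, T, H)
    pred = head(hidden.mean(dim=1))          # pooled dimensional prediction
    loss = torch.nn.functional.mse_loss(pred, torch.zeros_like(pred))
scaler.scale(loss).backward()                # scaled backward for fp16 safety
scaler.step(opt)
scaler.update()
```

Caching would go one step further by precomputing and storing the frozen layers' outputs so that only the unfrozen tail runs each epoch.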


Efficient Motion Prediction: A Lightweight & Accurate Trajectory Prediction Model With Fast Training and Inference Speed

Prutsch, Alexander, Bischof, Horst, Possegger, Horst

arXiv.org Artificial Intelligence

For efficient and safe autonomous driving, it is essential that autonomous vehicles can predict the motion of other traffic agents. While highly accurate, current motion prediction models often demand substantial training resources and are challenging to deploy on embedded hardware. We propose a new efficient motion prediction model, which achieves highly competitive benchmark results while training in only a few hours on a single GPU. Due to our lightweight architectural choices and our focus on reducing the required training resources, our model can easily be applied to custom datasets. Furthermore, its low inference latency makes it particularly suitable for deployment in autonomous applications with limited computing resources.


Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming

Kim, Jinuk, Jeong, Yeonwoo, Lee, Deokjae, Song, Hyun Oh

arXiv.org Artificial Intelligence

Recent works on neural network pruning advocate that reducing the depth of the network is more effective at reducing run-time memory usage and accelerating inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a restricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm that targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly via two-stage dynamic programming within a few seconds. We evaluate our methods and baselines with TensorRT for a fair inference latency comparison. Our method outperforms the baseline method with higher accuracy and faster inference speed on MobileNetV2 and the ImageNet dataset. Specifically, we achieve a $1.41\times$ speed-up with a $0.11$\%p accuracy gain on MobileNetV2-1.0 on ImageNet.
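The merge step rests on a standard identity: once the activation between two convolutions is replaced by the identity function, the pair collapses exactly into a single convolution whose kernel is the full convolution of the two kernels. A minimal PyTorch sketch of that identity for stride-1, bias-free layers; this illustrates the merge itself, not the paper's dynamic-programming selection:

```python
# Merging conv -> identity -> conv into one equivalent convolution.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
c0, c1, c2, k1, k2 = 3, 8, 4, 3, 3
w1 = torch.randn(c1, c0, k1, k1)   # first conv's weights
w2 = torch.randn(c2, c1, k2, k2)   # second conv's weights

# Merged kernel: full 2-D convolution of w1 and w2, contracted over the
# shared channel dimension c1; its spatial size is k1 + k2 - 1.
merged = F.conv2d(w2, w1.permute(1, 0, 2, 3).flip(-1, -2), padding=k1 - 1)

x = torch.randn(1, c0, 32, 32)
y_two = F.conv2d(F.conv2d(x, w1), w2)   # two convs, identity in between
y_one = F.conv2d(x, merged)             # single equivalent conv
print(torch.allclose(y_two, y_one, atol=1e-3))  # True, up to float error
```

The fused layer trades a deeper pipeline for one wider kernel, which is exactly the kind of candidate the paper's dynamic program weighs against measured latency.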


How weak is YOUR password? Graphic shows exactly how long it would take hackers to break it

Daily Mail - Science & tech

As tedious as the incessant requests are for longer and harder-to-remember passwords, experts say there's good reason for the nuisance. It's gotten easier and easier for hackers to guess your password as computer processing speeds have gotten faster. With sprawling cloud-based computing power now available for rent to anyone, and massive supercomputers out there like the system that trained ChatGPT, cyber security firm Hive Systems says that a truly professional hacker could access your secrets almost instantly. The company has produced a new table showing just how safe or vulnerable your password is, based on its character count and the diversity of characters you've used. They say you'll need a fully random password, at least 12 characters long, with a mixture of numbers, special symbols, and upper- and lowercase letters, if you want to keep even just an amateur hacker out of your account, given the power of today's consumer desktop tech.
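The arithmetic behind such tables is simple: the number of candidate passwords is the charset size raised to the password length, and the worst-case crack time is that count divided by the attacker's guess rate. A quick Python sketch; the 10^12 guesses-per-second rate is an assumption for illustration, not Hive Systems' figure:

```python
# Back-of-the-envelope brute-force estimate: keyspace / guess rate.
def crack_time_years(length, charset_size, guesses_per_sec=1e12):
    keyspace = charset_size ** length        # all possible passwords
    seconds = keyspace / guesses_per_sec     # worst case: try every one
    return seconds / (365 * 24 * 3600)

# Lowercase letters only (26) vs. a full mix of lower/upper/digits/symbols (~94).
for n in (8, 12, 16):
    print(n, f"{crack_time_years(n, 26):.2e}", f"{crack_time_years(n, 94):.2e}")
```

The exponent is what matters: each extra character multiplies the keyspace by the charset size, which is why length plus character diversity beats either one alone.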


Top GPUs For Deep Learning and Machine Learning in 2022

#artificialintelligence

As we walk into the age of AI, there is an exponential rise in the demand for GPUs. GPUs apply the not-so-new technique of parallel computing to process computations, and with very high numbers of ALUs, or processing units, on board, they have become very well suited to the heavy computations of AI. Furthermore, with the advent of deep learning in the current decade, most deep learning frameworks, including the vastly popular TensorFlow, PyTorch, Theano, etc., enable advanced optimization of computations on the GPU. Currently, a vast number of GPUs are available, differing in features such as the number of processing units, memory capacity, and clock frequency.
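For readers wondering what enabling GPU computation looks like in practice, here is a minimal PyTorch example (the other frameworks named above expose similar device switches); the matrix sizes are arbitrary:

```python
# Run a matrix multiply on the GPU when one is available, else the CPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b   # this multiply fans out across the GPU's parallel ALUs
print(c.device)
```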


Tested: 5 key things to know about Nvidia's GeForce RTX 3090

PCWorld

Nvidia's GeForce RTX 3090 delivers exhilarating graphics prowess: the fastest possible gaming frame rates at extreme resolutions, and outstanding performance in professional applications. At $1,500, it's either a hard pass or a no-brainer, depending on how you plan to use it. You can read every nitty-gritty detail in our comprehensive review of Nvidia's GeForce RTX 3090 Founders Edition. But if you don't feel like sifting through thousands of words of technical and testing details, here are the five key things you need to know. Yes, the GeForce RTX 3090 offers the "ultimate gaming experience" that Nvidia promised.


Nvidia GeForce RTX 3080 Founders Edition review: Staggeringly powerful

PCWorld

Nvidia's GeForce RTX 3080 graphics card symbolizes why we tell people to wait for the second generation when bleeding-edge technology appears. The radical new-look Turing GPUs inside Nvidia's GeForce RTX 20-series packed all sorts of cutting-edge technologies designed to usher in real-time ray tracing, a long sought-after goal for the gaming industry. Not only did Turing introduce specialized RT cores devoted to processing ray tracing tasks, it also debuted tensor cores, dedicated hardware that uses machine learning to help denoise ray traced visuals and enable AI-enhanced tools like the fantastic Deep Learning Super Sampling (DLSS) technology. Turing's improvements also extended to the traditional shader cores, introducing an overhauled processing pipeline better equipped to handle games built using the newer DirectX 12 and Vulkan graphics APIs. All of these were huge departures from the norm.